Primed and ready: nanopore metabarcoding can now recover highly accurate consensus barcodes that are generally indel-free

Background DNA metabarcoding applies high-throughput sequencing approaches to generate numerous DNA barcodes from mixed sample pools for mass species identification and community characterisation. To date, however, most metabarcoding studies employ second-generation sequencing platforms like Illumina, which are limited by short read lengths and longer turnaround times. While third-generation platforms such as the MinION (Oxford Nanopore Technologies) can sequence longer reads and even in real-time, application of these platforms for metabarcoding has remained limited possibly due to the relatively high read error rates as well as the paucity of specialised software for processing such reads. Results We show that this is no longer the case by performing nanopore-based, cytochrome c oxidase subunit I (COI) metabarcoding on 34 zooplankton bulk samples, and benchmarking the results against conventional Illumina MiSeq sequencing. Nanopore R10.3 sequencing chemistry and super accurate (SUP) basecalling model reduced raw read error rates to ~ 4%, and consensus calling with amplicon_sorter (without further error correction) generated metabarcodes that were ≤ 1% erroneous. Although Illumina recovered a higher number of molecular operational taxonomic units (MOTUs) than nanopore sequencing (589 vs. 471), we found no significant differences in the zooplankton communities inferred between the sequencing platforms. Importantly, 406 of 444 (91.4%) shared MOTUs between Illumina and nanopore were also found to be free of indel errors, and 85% of the zooplankton richness could be recovered after just 12–15 h of sequencing. Conclusion Our results demonstrate that nanopore sequencing can generate metabarcodes with Illumina-like accuracy, and we are the first study to show that nanopore metabarcodes are almost always indel-free. 
We also show that nanopore metabarcoding is viable for characterising species-rich communities rapidly, and that the same ecological conclusions can be obtained regardless of the sequencing platform used. Collectively, our study inspires confidence in nanopore sequencing and paves the way for greater utilisation of nanopore technology in various metabarcoding applications. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-024-10767-4.


Background
DNA metabarcoding refers to the high-throughput sequencing of total (and sometimes degraded) DNA from bulk or environmental samples (e.g., air, water, soil, faeces, etc.) with the goal of multispecies identification [1]. It was built upon the DNA barcoding paradigm that has been established for about two decades, which involves sequencing short segments of DNA (termed "barcodes") and matching them to sequence databases to obtain species identities [2]. DNA metabarcoding emerged in the 2010s, and was primarily made possible by rapid advancements in nucleic acid sequencing technologies, namely "next-generation sequencing" (NGS) platforms, which can generate billions of sequence reads in a single experiment [3]. This development has been groundbreaking because NGS platforms generate sequence reads (i.e., DNA barcodes) in parallel, so multispecies detection and identification from various sample types are now possible. This has led to a meteoric rise in the number of studies that have since performed NGS-based barcoding or metabarcoding for various applications. For instance, 60% of DNA sequencing studies in marine science published yearly between 2013 and mid-2022 generated their sequence reads with Illumina [4]. The release of the MinION in 2014 by Oxford Nanopore Technologies (ONT) became another significant milestone in nucleic acid sequencing for several reasons: (1) its lower entry and per-base sequencing cost (2,000 USD for the entry starter pack), (2) its ability to perform long-read sequencing (now up to ~ 4 Mb long), (3) its compact size and portability, and (4) its ability to generate data in real-time [5,6]. All these were perhaps a direct response to common criticisms of Illumina sequencing, which is comparatively more expensive, and limited by its short read-lengths (up to ~ 500 bp). Since then, nanopore sequencing has been applied in numerous whole-genome sequencing studies [7-10] and metagenomic studies [11,12].
We posit that the general lack of nanopore-based metabarcoding studies can be attributed to two main factors. The first is the perception that nanopore reads are highly erroneous. This is unsurprising given that early studies have reported error rates of ~ 20% [30] to as high as 38% [31]. In contrast, the current error rate of Illumina sequencing is only 0.24% [32]. There is thus concern that the high error rates would hinder accurate species identification in DNA metabarcoding. The second factor could be the lack of programs to process nanopore reads for metabarcoding (but see below), compared to the plethora of pipelines catered to short-read sequencing, like APSCALE [33], DADA2 [34], eDNAflow [35], or OBITools [36]. DADA2 currently supports PacBio circular consensus sequencing but not nanopore reads [37], and even ONT's own EPI2ME platform is intended for microbial sequencing only. Nanopore-specific workflows like ONTrack [38], NGSpeciesID [39] and miniBarcoder [40,41] were designed mainly for DNA barcoding, although Davidov et al. [13] have successfully applied ONTrack to process their metabarcoding reads. Prior metabarcoding studies have worked around the lack of specialised software by either: (i) conducting BLAST searches of raw nanopore reads with stricter e-value settings as low as 1e-40 to minimise erroneous matches due to chance [21,26], (ii) using custom reference databases for mapping and processing reads [23], or (iii) using existing programs designed for short reads, like VSEARCH [42] or CD-HIT [43] with more relaxed settings for clustering error-prone nanopore reads [28,44].
We expect that nanopore metabarcoding studies will become more common, given the release of new nanopore metabarcoding workflows like ASHURE [20], decona [45] and MSI [27], the real-time sequencing capabilities of the platform, as well as improvements in flow cell chemistries and base calling models over time. The latter is evidenced by the decrease in raw read error rate to ~ 6% using R9.4 flow cell chemistry [46], and even lower to ~ 4% for R10.3 flow cells [47]. Two research groups have since independently confirmed that it is possible to generate highly accurate, Illumina-like DNA barcodes without further need for error correction with R10.3 sequencing chemistry [48,49]. As of writing, raw read accuracy is now ~ 99% with the latest R10.4.1 sequencing chemistry and base calling models (see https://rrwick.github.io/ for more up-to-date information).
In light of these improvements in sequencing accuracy, we propose that the time is ripe for broader-scale nanopore metabarcoding, and on more complex biological communities. In this study, we performed mitochondrial cytochrome c oxidase subunit I (COI) metabarcoding on species-rich, bulk zooplankton samples collected from the tropical waters of Singapore. We then benchmarked the relative abundance and community composition of molecular operational taxonomic units (MOTUs) obtained from nanopore sequencing against Illumina sequencing (the current gold standard for metabarcoding sequencing) to investigate if the sequencing platform affects community characterisation of zooplankton communities. We show that processing nanopore reads with available programs like amplicon_sorter [48] produces highly accurate consensus metabarcodes that are Illumina-like in accuracy. To the best of our knowledge, this is the first study to demonstrate that nanopore consensus metabarcodes are almost always indel-free, even with R10.3 chemistry. This is also an advancement over existing workflows that incorporate clustering and subsequent polishing steps, as these sequences would still retain indel errors, thereby reducing confidence in their quality. We further demonstrate that such high-quality metabarcodes can be obtained without the need for complicated wet-laboratory procedures like rolling circle amplification as with the ASHURE workflow, or even error correction programs, like in the MSI and decona pipelines. Moreover, we were able to recover ~ 85% of zooplankton richness with 12-15 h of sequencing run time. Our study demonstrates the viability of nanopore metabarcoding for analysing complex, biodiverse communities, and we hope this inspires greater confidence in nanopore sequencing for a greater variety of metabarcoding applications.

Sample collection and processing
The study samples comprised a series of zooplankton collections made during August-September 2020 in Singapore. Collections were permitted by the National Parks Board, Singapore (Permit Number NP/RP18-051). The targeted sites were off Pulau Hantu and Sisters' Islands in the Singapore Strait (see Supplementary File S1 for GPS coordinates). All plankton collections were performed at night (1800-2200 h), and sampling was conducted in two ways. First, triplicate oblique plankton tows were performed from a boat with bongo nets (2 m in length, 500 μm mesh size, 50 cm ring diameter) from a depth of 15 m to the surface at 1 m/s. The plankton net was always rinsed with fresh water before each tow, and its contents were collected as the field negative control. After each tow, the contents from one cod-end were poured through 2 mm and 500 μm sieves to filter excess seawater before bulk preservation in molecular-grade ethanol [50]. Specimens larger than 1 cm were picked out individually. The collections were thus separated into three size fractions: 1 cm, 2 mm and 500 μm. Second, a quatrefoil light trap (30 cm diameter by 25 cm height; 5 mm entry slit width) fitted with two GT-AAAs (Glo-Toob) was left at the jetty of each island 1.5 m below the water surface for two hours (see Supplementary File S1 for GPS coordinates). Light trap samples were processed in the same way as bongo net samples. All bulk samples were brought back to the laboratory and stored at -20 °C prior to DNA extraction.

DNA extraction and PCR amplification
Bulk samples were first ground with pre-sterilized mortar and pestles. Genomic extraction was performed with DNeasy Blood and Tissue Kit (Qiagen) following the manufacturer's protocol, except that genomic DNA was eluted in nuclease-free water. To prevent cross-contamination, a fresh set of autoclaved mortar and pestle was used for each tow/light trap. All units were thoroughly washed and autoclaved before the next set of DNA extractions.
We amplified the 313-bp fragment of mitochondrial COI for direct comparison of PCR products across short- and long-read platforms. PCR amplification was performed using the mlCOIintF: 5'-GGW ACW GGW TGA ACW GTW TAY CCY CC-3' [51] and LoboR1: 5'-TAA ACY TCW GGR TGW CCR AAR AAY CA-3' [52] primer combination. This primer combination was also chosen for its high amplification success in marine organisms [53-56], and is approximately four times cheaper than the conventional mlCOIintF and jgHCO2198 [57] metabarcoding primer pair [28,58]. Furthermore, Yeo et al. [59] have also demonstrated that 313-bp COI sequences performed just as well as 658-bp barcodes for species-level identification. The primers were tagged at the 5' end with custom 13-bp sequences (i.e., "tags") from Srivathsan et al. [41] to allow for downstream demultiplexing of sequence reads to samples. The longer-than-usual tag lengths were necessary to accommodate the error profile of Kit 9 and R10.3 sequencing chemistry [41] (though it was recently reported that shorter 9-bp tags work well for R10.4.1 sequencing kits and flow cells [60]). Each PCR was assigned its own unique forward and reverse tag combination where possible, and if there were overlapping tag combinations, we separated them into different library pools (i.e., Plate A and B).
PCR was carried out in 25 µl triplicate reactions using 2 µl genomic DNA (100× dilution of original extract), 12.5 µl of GoTaq Green Master Mix (Promega), 2 µl of 10 µM 13-bp tagged forward and reverse primers, 1 µl of bovine serum albumin (1 mg/ml; New England Biolabs) and 7.5 µl of nuclease-free water. A step-up thermocycling profile was used: 1 min denaturation at 94 °C; 5 cycles of 30 s at 94 °C, 2 min at 45 °C and 1 min at 72 °C; 30 cycles of 30 s at 94 °C, 2 min at 55 °C and 1 min at 72 °C; and a final extension of 3 min at 72 °C. All PCR products were screened on 2% agarose gels stained with GelRed (Biotium Inc.) to ensure appropriate amplification. PCR amplicons were subsequently combined by plate into two pools and purified with SureClean Plus (Bioline). Plate A and B had 48 and 72 amplicons (including negatives and controls) respectively. In total, 34 samples, four field controls, and two PCR negatives were carried forward for Illumina and nanopore library preparation (40 × 3 PCRs = 120 amplicons).

Illumina metabarcoding and bioinformatics
We prepared two Illumina libraries using NEBNext Ultra II DNA Library Prep Kit (New England Biolabs) following the manufacturer's protocol, up to the adapter ligation step (i.e., PCR-free libraries). Libraries were multiplexed using TruSeq CD Dual Indexes (Illumina). Cleanups were performed using 1.0× AMPure XP beads (Beckman Coulter). The two libraries were pooled together and outsourced for sequencing on a single Illumina MiSeq (2 × 250-bp) lane at the Genome Institute of Singapore.
Illumina reads were processed according to a modified metabarcoding pipeline from Sze et al. [61] and Ip et al. [62]. First, Illumina paired-end reads were merged using PEAR v0.9.6 [63]. Thereafter, OBITools v1.2.13 [36] was used for downstream processing of assembled reads. Specifically, the ngsfilter module was used to demultiplex reads to respective PCR replicates under default settings, where up to 2-bp mismatch was allowed for primer sequences, but no mismatch was allowed for tag sequences. Sequence reads were then dereplicated and sorted to samples with obiuniq and obisubset respectively. We retained sequences with ≥ 5 counts and between 303 and 323 bp in length using obigrep. Subsequently, the filtered reads were further collapsed with obiclean, where sequences with 1-bp difference from each other were considered sequencing errors and collapsed, and only reads with 'head' status were retained. We then concatenated all sequences across all samples, and ran cdhit-est v.4.8.1 [43] to collapse 100% identical sequences. Any sequence that clustered with PCR negatives or control samples at 100% was eliminated.
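The count and length filter described above can be illustrated with a minimal Python sketch. This is not the OBITools implementation; the function name `filter_unique_sequences` is hypothetical, and only the thresholds (≥ 5 counts, 303 to 323 bp) are taken from the text.

```python
from collections import Counter

def filter_unique_sequences(reads, min_count=5, min_len=303, max_len=323):
    """Dereplicate reads, then keep unique sequences that pass both filters.

    reads: iterable of sequence strings belonging to one sample.
    Returns a dict mapping each retained sequence to its read count.
    """
    counts = Counter(reads)  # dereplication: identical reads are collapsed
    return {
        seq: n
        for seq, n in counts.items()
        if n >= min_count and min_len <= len(seq) <= max_len
    }

# Toy data (not real COI sequences): only the first sequence passes both filters.
toy_reads = ["A" * 313] * 5 + ["C" * 313] * 4 + ["G" * 300] * 10
kept = filter_unique_sequences(toy_reads)
```

In this toy input, the second sequence fails the count threshold and the third fails the length window, so only one unique sequence is retained.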

Nanopore metabarcoding and bioinformatics
The same cleaned amplicon pools were used to prepare two nanopore libraries with the Ligation Sequencing Kit (SQK-LSK109) following the manufacturer's protocol, but end-repair and adapter ligation times were increased to 60 and 15 min respectively [58]. Cleanups were likewise done using 0.9× AMPure XP beads (Beckman Coulter) and the supplied Short Fragment Buffer (SFB). Finally, the two libraries were each sequenced on fresh R10.3 MinION flow cells on MinKNOW v20.10.3 for Ubuntu 16. The R10.3 flow cell chemistry was selected given its improved accuracy and homopolymer resolution [49,64]. RUN A lasted 20 h and 30 min, while RUN B lasted 41 h. Raw fast5 reads were exported to the National University of Singapore's High Performance Computing Volta cluster for GPU basecalling on NVIDIA Tesla V100 SXM2 32GB with Guppy v5.0.14 + 8f53ee9, using the super accurate (SUP) model at default settings. We then performed a length filter with NanoFilt v2.8.0 [65] to retain only sequence reads ≥ 250 bp in length. Subsequently, the sequences were distributed to respective PCR replicates with the demultiplexing module of ONTbarcoder v0.1.9 [49]. We set 313 bp as the read length threshold, and kept the other settings as default. Only sequences deviating up to 2 bp from the tag sequence were accepted in the demultiplexing process, which was possible as tags were designed to differ by ≥ 3 bp from each other [41]. Moreover, ONTbarcoder recognises and splits self-ligated reads during demultiplexing, thereby retaining more reads for downstream analysis. Thereafter, we concatenated the reads by sample.
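To illustrate the idea of tolerant tag matching, the following Python sketch assigns an observed tag to the closest known tag if it lies within two mismatches. It is only a simplified illustration under stated assumptions: it uses Hamming distance (substitutions only), whereas nanopore reads also carry indels, and it does not reproduce ONTbarcoder's actual algorithm. The tag names and sequences are made up, and ambiguous ties are rejected as an extra precaution of this sketch.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def assign_tag(observed, tags, max_mismatch=2):
    """Return the name of the unique closest tag within max_mismatch, else None.

    tags: dict mapping a tag name to its sequence (hypothetical tags here).
    """
    scored = sorted((hamming(observed, seq), name) for name, seq in tags.items())
    best_dist, best_name = scored[0]
    if best_dist > max_mismatch:
        return None  # too many mismatches to any known tag
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # two tags are equally close: ambiguous, so reject
    return best_name

# Two made-up 13-bp tags that differ at three positions.
example_tags = {"tagA": "AAAAAAAAAAAAA", "tagB": "CCCAAAAAAAAAA"}
assigned = assign_tag("ACAAAAAAAAAAA", example_tags)   # 1 mismatch to tagA, 2 to tagB
rejected = assign_tag("GGGGAAAAAAAAA", example_tags)   # 4 mismatches to both tags
```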
For metabarcoding analysis, we used amplicon_sorter v2022-03-28 [48] to sort and group the nanopore reads based on length and sequence similarity in order to generate consensus metabarcodes. We selected it for three reasons. First, amplicon_sorter performs reference-free clustering, which is extremely useful in our case since we did not have a priori knowledge of the community composition of our zooplankton samples. Second, amplicon_sorter considers all possible clusters when generating consensus sequences, meaning it can be utilised to analyse DNA metabarcoding data. Third, amplicon_sorter corrects for indel errors when calling the majority consensus, thereby generating Illumina-like quality metabarcodes that will almost always be indel-free. This was unachievable in our prior tests of the same dataset using VSEARCH and CD-HIT with subsequent polishing with RACON [66] and medaka (https://github.com/nanoporetech/medaka): most nanopore metabarcodes still contained indel errors after polishing (data not shown).
We adopted a conservative approach where sequences were added into a species group by amplicon_sorter only if they were ≥ 97% similar (--similar_species), and consensus sequences were combined together only if they were ≥ 98% similar (--similar_consensus). We also set the minimum and maximum length limits to 293 and 333 bp respectively, and performed 3× random sampling (--maxreads) to increase the likelihood of sampling rare reads. We then mapped the sequences of each cluster back to the respective consensus sequence with minimap2 v2.24 [67] and polished the consensus with medaka v1.7.2, using the r103_sup_g507 model. Finally, we removed sequences that were present in our PCR negatives and controls from the samples using the same method described for Illumina metabarcoding.
With the final consolidated MOTU dataset, we assessed if and how MOTU communities compared between sequencing types quantitatively using diversity metrics and PERMANOVA, and qualitatively by examining the agreement in MOTU composition in terms of proportion and abundance. All statistical analyses were performed in R v4.3.1 [73], in RStudio (build 2023.03.0) unless otherwise stated, and all relevant plots were generated with the ggplot2 v3.4.2 package [74]. We computed the MOTU richness, Shannon-Wiener, and Simpson indices for each sequencing dataset using the diversity function in vegan v2.6-4 [75] and ran a paired, nonparametric Wilcoxon signed-rank test to test whether differences in the indices were due to different sequencing platforms. We also plotted the rarefaction curves of MOTU richness for each dataset with iNEXT v3.0.0 [76] to examine the relationship between MOTU richness and sampling depth. Community similarities between sequencing types were assessed using: (i) the Jaccard similarity coefficient, by converting the MOTU community matrix to binary absence/presence data; and (ii) Bray-Curtis distances, where we normalised our MOTUs by relative abundance of sequencing reads [77]. We visualised the distances using nMDS plots (metaMDS in vegan) and heatmaps constructed with the pheatmap v1.0.12 package [78]. We also performed PERMANOVA with adonis2 in vegan to test for community differences between Illumina and nanopore sequencing. Here, sequencing type (Illumina or nanopore) was included as a variable, in addition to site (Pulau Hantu or Sisters' Islands), date (5 August 2020, 19 August 2020, 20 August 2020, 2 September 2020, 3 September 2020 or 16 September 2020), as well as fraction (1 cm, 2-500 μm). We first verified that each variable had a non-significant betadisper result before inclusion into PERMANOVA. We also analysed the datasets separately to confirm the same ecological conclusions would be obtained regardless of sequencing type. For this, we used the same Bray-Curtis distance datasets, and visualised the community dissimilarities with nMDS. For PERMANOVA, we only incorporated the bongo net samples as that sampling method had the most samples. We used the same three variables (site, date, fraction) and groupings as above for PERMANOVA with adonis2.
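For readers less familiar with the two community metrics, a minimal Python sketch of their definitions is given below. The analyses themselves were run in R with vegan; the function names and the toy read counts here are hypothetical and serve only to show how presence/absence conversion (Jaccard) and relative-abundance normalisation (Bray-Curtis) enter the calculation.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity on presence/absence of MOTUs (read counts > 0)."""
    present_a = {motu for motu, n in a.items() if n > 0}
    present_b = {motu for motu, n in b.items() if n > 0}
    union = present_a | present_b
    return len(present_a & present_b) / len(union) if union else 1.0

def relative_abundance(counts):
    """Normalise read counts so that they sum to 1 within a sample."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {motu: n / total for motu, n in counts.items()}

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    motus = set(a) | set(b)
    numerator = sum(abs(a.get(m, 0) - b.get(m, 0)) for m in motus)
    denominator = sum(a.get(m, 0) + b.get(m, 0) for m in motus)
    return numerator / denominator

# Toy read counts for one sample on the two platforms (made-up values).
illumina = {"MOTU1": 60, "MOTU2": 30, "MOTU3": 10}
nanopore = {"MOTU1": 120, "MOTU2": 80}

jac = jaccard_similarity(illumina, nanopore)                                 # 2 of 3 MOTUs shared
bc = bray_curtis(relative_abundance(illumina), relative_abundance(nanopore))
```

Normalising to relative abundance before computing Bray-Curtis removes the effect of unequal sequencing depth between platforms, so that only differences in proportions contribute to the dissimilarity.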
We also examined MOTU community compositions to determine how consistent they were between nanopore and Illumina platforms. We first looked at MOTU composition based on phyla, and compared the relative proportions of each phylum at the sequencing dataset level, and further at the sample level. In addition, we were also interested to know if a MOTU that was abundant in nanopore sequencing would be similarly so with Illumina sequencing. For each sample, we sorted and ranked the MOTUs by sequencing reads, and then assessed similarity in rank order of MOTUs between sequencing platforms with the Kendall rank correlation coefficient (Kendall's τ) [79]. We performed the correlation analysis only for 31 out of 34 samples as the remaining three samples had only one pairwise comparison.
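As an illustration of the rank-agreement measure, the sketch below computes Kendall's τ from concordant and discordant pairs. It is a simplified tau-a without tie correction, which may differ from the variant computed by the statistical software used in the study when ties are present; the function name and the read counts are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    x, y: read counts of the same MOTUs on two platforms, in the same order.
    Tied pairs count as neither concordant nor discordant.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < 2:
        raise ValueError("at least two MOTUs are needed to form a pair")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Made-up read counts for four shared MOTUs in one sample.
illumina_counts = [100, 50, 20, 5]
nanopore_counts = [90, 60, 10, 30]
tau = kendall_tau_a(illumina_counts, nanopore_counts)  # 5 concordant, 1 discordant pair
```

A sample with only two shared MOTUs yields a single pair, so τ can only be +1 or -1 there, which is consistent with leaving such samples out of the correlation analysis.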

Sequencing accuracy and quality of nanopore reads
A known drawback of nanopore sequencing is its relatively high error rates. A close examination of the error rates of the raw reads and consensus sequences here was thus necessary to allay existing concerns regarding its use. We mapped the nanopore sequences against the cleaned Illumina sequences at the sample-level (e.g., ZPT005 nanopore reads to ZPT005 Illumina reads) with mapPacBio.sh v38.96 in BBTools (script was also recommended for nanopore data; https://sourceforge.net/projects/bbmap/). We maximised mapping sensitivity with the --vslow flag, and mapped two datasets: (i) the demultiplexed reads from ONTbarcoder to estimate raw read error rates and (ii) consensus sequences generated from amplicon_sorter to assess consensus sequence quality. We only considered mappings where the nanopore queries had ≥ 90% identity match to the Illumina reference sequences, and computed the total error rates, which took into account substitutions, insertions, deletions and ambiguous bases.
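The total error rate can be written as a simple ratio, sketched below in Python. The exact denominator used by the mapping software is not stated in the text, so dividing by the aligned length is an assumption of this sketch; the function names and the example counts are hypothetical.

```python
def total_error_rate(substitutions, insertions, deletions, ambiguous, aligned_length):
    """Percentage of erroneous positions, summed over all four error types.

    Assumption: errors are expressed relative to the aligned length.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    errors = substitutions + insertions + deletions + ambiguous
    return 100.0 * errors / aligned_length

def passes_identity_filter(identity_pct, min_identity=90.0):
    """Keep only mappings with at least 90% identity to the reference."""
    return identity_pct >= min_identity

# Made-up example: 4 errors over a 400-bp alignment gives a 1% total error rate.
example_rate = total_error_rate(substitutions=2, insertions=1, deletions=1,
                                ambiguous=0, aligned_length=400)
```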
Additionally, for each MOTU shared between Illumina and nanopore datasets, we further compared the constituent Illumina and nanopore member sequences of that MOTU with dnadiff v1.3 [80]. As our Illumina sequences were already confirmed to be translatable, and are thus free of frameshift errors and unlikely to be NUMTs, this comparison allowed us to assess the frequency of indel errors in our nanopore consensus sequences.

Time sampling of nanopore reads
Given the real-time sequencing properties of the MinION, we also preliminarily examined the relationship between sequencing run time and its effect on the nanopore metabarcoding. It was previously observed that 80-90% of DNA barcodes were obtained within the first few hours of sequencing [40,49] for DNA barcoding studies. Here, we tested if the observed trends would be similar in a nanopore metabarcoding context. We subsampled the nanopore reads generated from each run for every hour for the first three hours of sequencing, followed by every three hours thereafter, until 18 h for RUN A and 39 h for RUN B. For each time period, we repeated the entire workflow from Guppy basecalling to amplicon_sorter (see section 'Nanopore metabarcoding and bioinformatics'). For each time point, we noted down (i) the number of raw reads generated, (ii) the number of reads demultiplexed by ONTbarcoder, and (iii) the number of metazoan MOTUs obtained for each time series dataset.
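The subsampling schedule described above (hourly for the first three hours, then every three hours up to the end point of each run) can be enumerated with a short Python sketch. The function name is hypothetical; only the schedule and the two end points (18 h and 39 h) come from the text.

```python
def sampling_time_points(run_end_h, hourly_until=3, step_h=3):
    """Hours at which reads are subsampled: 1, 2, 3, then every step_h hours."""
    points = list(range(1, hourly_until + 1))
    t = hourly_until + step_h
    while t <= run_end_h:
        points.append(t)
        t += step_h
    return points

run_a_points = sampling_time_points(18)  # [1, 2, 3, 6, 9, 12, 15, 18]
run_b_points = sampling_time_points(39)  # 15 time points, ending at 39 h
```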

Zooplankton collections
A total of 49 bulk zooplankton samples (24 and 25 from Pulau Hantu and Sisters' Islands respectively) were collected and included in this study (Supplementary File S1). Of the 49 samples, 37 were bongo net samples, seven were light trap samples, and five were field control samples. After sieving and sorting, the 500 μm size fraction was the most common (29 samples), followed by 2 mm (18 samples), with the 1 cm fraction class having the least (2 samples). PCR amplification was successful for 34 samples (28 bongo net and 6 light trap samples), and nanopore and Illumina libraries were prepared for a total of 40 samples for this comparative study (including four field controls and two PCR negatives).

Metabarcoding and MOTU delimitation
For Illumina sequencing, we generated 10,038,735 paired-end reads on a single Illumina MiSeq lane. Of these, 7,630,728 reads were successfully assembled with PEAR, 4,218,977 reads were successfully demultiplexed (55.3% demultiplexing success), and 4,162,498 reads remained after the length filter. Most Illumina reads dropped out at the PEAR assembly stage due to Q-score filtering, and during the demultiplexing step due to strict settings (no mismatches allowed in tags). We obtained 10,788 clean haplotypes after removing sequences present in controls and PCR negatives.
For nanopore sequencing, we generated 20,045,167 raw reads from across two MinION sequencing runs (RUN A and B). We retained 14,123,752 reads after Guppy basecalling and NanoFilt, and 6,918,618 reads after demultiplexing with ONTbarcoder (48.6% demultiplexing success). The low demultiplexing success rate is common for 13-bp tagged primers and sequencing with R10.3 chemistry [41,64,81], but should not be a cause for concern as ~60% demultiplexing success rates are obtainable with R10.4.1 chemistry [82]. Consensus calling with amplicon_sorter generated a total of 4,206 sequences from 3,525,077 reads (51% of demultiplexed reads). At the sample level, 57.6% of demultiplexed reads were utilised by the program to generate consensus sequences on average (minimum 47.1%, maximum 73.3%). The median length was 313 bp (62% of total sequences generated); minimum and maximum sequence lengths were 300 and 339 bp respectively. We also observed that amplicon_sorter very rarely generated consensus sequences from different "gene groups" (two samples had one consensus sequence each while only one sample had five such consensus sequences). These were found to be of non-mitochondrial origin when we conducted nucleotide BLAST searches on NCBI web servers, and were thus excluded from the dataset. After filtering sequences present in the negatives and controls, we retained 3,973 consensus sequences (3,295,247 reads). As polishing with medaka had a minimal impact in reducing error rates (~ 0.02% decrease), we carried out the analysis using the unpolished dataset instead (see [48]).
From the combined sequencing dataset, we obtained 1,031 molecular operational taxonomic units (MOTUs) at the 3% threshold, with only 688 identified (at 85% identity match with ≥ 250-bp overlap) via readsidentifier. We discarded 61 MOTUs (four unclassified environmental samples, 35 Rhodophyta, 10 Fungi, eight Bacillariophyta, two Phaeophyceae, one Dinophyceae, and one Oomycota). We further eliminated one Illumina MOTU for failing the translation check, and 10 MOTUs that matched non-marine Insecta. None of the remaining MOTUs' geographic ranges fell outside the Indo-Pacific. Our final dataset comprised 616 Metazoa MOTUs, of which 316 had ≥ 97% match to a sequence on the NCBI nt database, and 274 out of 316 obtained a species-level identity (Supplementary File S1).
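The MOTU tally above can be retraced with a few lines of Python. This is only a bookkeeping check on the reported numbers, under the assumption (our reading, since it reproduces the final count) that all exclusions were applied to the 688 identified MOTUs; the variable names are hypothetical.

```python
identified_motus = 688  # MOTUs with an identity match via readsidentifier

# Non-metazoan groups discarded, as listed in the text.
discarded_non_metazoa = {
    "unclassified environmental samples": 4,
    "Rhodophyta": 35,
    "Fungi": 10,
    "Bacillariophyta": 8,
    "Phaeophyceae": 2,
    "Dinophyceae": 1,
    "Oomycota": 1,
}
failed_translation = 1   # one Illumina MOTU failed the translation check
non_marine_insecta = 10  # MOTUs matching non-marine Insecta

n_discarded = sum(discarded_non_metazoa.values())
final_metazoa_motus = (identified_motus - n_discarded
                       - failed_translation - non_marine_insecta)
```

Under this reading, the discarded groups sum to 61 and the remaining count matches the 616 Metazoa MOTUs reported.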

Comparing nanopore and Illumina metabarcoding
The proportion of demultiplexed reads assigned to each sample was largely consistent across both Illumina and nanopore sequencing for most samples (Fig. 1a). Illumina recovered a higher number of MOTUs (589 vs. 471) than nanopore, but species accumulation curves suggested that ~ 120 samples were needed to fully capture zooplankton diversity for both sequencing types (Fig. 1b). A total of 444 MOTUs were shared (72% overlap) across both sequencing platforms, with more MOTUs unique to Illumina than to nanopore (Fig. 1b, inset). At the sample-level, Illumina metabarcoding also consistently recovered more MOTUs than nanopore, with the exception of ZPT017 and ZPT023 (Fig. 1c). MOTU richness (p-value = 4.056 × 10⁻⁵) and Shannon-Wiener diversity (p-value = 0.03) were found to be significantly different across paired samples, while Simpson diversity was not (p-value = 0.63, Fig. 1d). Even so, we observed clustering by sample on the nonmetric multidimensional scaling (nMDS) plots, especially with the Bray-Curtis distance metric (Fig. S1). This suggested that although MOTU richness differed across paired samples, the relative abundances of MOTUs within each sample were quite similar across both sequencing platforms. Permutational multivariate analysis of variance (PERMANOVA) revealed significant differences in communities for both Jaccard and Bray-Curtis datasets (Jaccard: df = 27, F = 1.2329, R² = 0.4542, p = 0.0014; Bray-Curtis: df = 27, F = 1.6542, R² = 0.52754, p = 0.0001), but the differences were driven by the other three variables and not sequencing type (Table 1). When each sequencing dataset was analysed separately, we noted the same ecological conclusions from the nMDS plots and PERMANOVA as well: the bongo net zooplankton communities were structured by date, fraction and site regardless of the sequencing platform (Fig. 2; Table 2).
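The overlap figures follow from inclusion-exclusion on the reported counts, as the short Python check below shows. The variable names are hypothetical; the three input numbers are those given in the text.

```python
illumina_motus = 589
nanopore_motus = 471
shared_motus = 444

# Inclusion-exclusion: MOTUs found by at least one platform.
total_motus = illumina_motus + nanopore_motus - shared_motus

unique_to_illumina = illumina_motus - shared_motus
unique_to_nanopore = nanopore_motus - shared_motus
overlap_pct = 100 * shared_motus / total_motus
```

The total agrees with the 616 MOTUs of the final dataset, with 145 MOTUs unique to Illumina and 27 unique to nanopore, and the overlap is about 72%.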
Since MOTU richness differed between each sample's Illumina and nanopore datasets, we checked if this difference altered the respective community compositions. Both Illumina and nanopore recovered all 10 metazoan phyla, with nanopore recovering an additional singleton Platyhelminthes MOTU. Proportions of phyla were found to be consistent across both sequencing datasets, and were largely dominated by Arthropoda (~ 53%), followed by Chordata (~ 20%) and then Cnidaria (~ 12%) (Fig. 3a and Table S1). The differences in MOTU richness were largely from these three dominant groups, with Illumina recovering 1.2 to 1.3× more MOTUs from each of these three phyla compared to nanopore (Table S1). The largest disparity was in Mollusca, for which Illumina recovered twice as many MOTUs as nanopore. For the remaining six phyla (Echinodermata, Annelida, Porifera, Chaetognatha, Ctenophora, Bryozoa), Illumina and nanopore recovered approximately the same number of MOTUs. At the sample-level, similar phylum proportions were also consistently observed, albeit with differences in species numbers (Fig. 3b). Only ZPT024 was markedly different in terms of community composition, and this was consistent with the stark dissimilarity observed with nMDS plots (Fig. S1). When MOTUs were ranked by sequencing read counts between sequencing platforms, we found that Kendall's τ was significantly positive for 30 samples (min: 0.484; max: 0.986; p-value << 0.05; Table S2), which suggested a positive correlation in MOTU rank abundance between both sequencing platforms. Kendall's τ was also positive for ZPT024 (0.478), but the p-value was not significant. This meant that if a MOTU was found to be abundant in one sample for one sequencing dataset, it would be highly likely to be abundant in the alternative platform as well. This assessment was consistent with the high pairwise Bray-Curtis similarity observed between samples across both sequencing platforms (Fig. S2), since the metric took into account read count data. This further demonstrated that nanopore metabarcoding could reliably and consistently recover abundant MOTUs; this was similarly corroborated by [28], even though our bioinformatic pipelines differed.

Nanopore metabarcode quality
We found that ~ 98% of the raw nanopore reads were erroneous when mapped to their respective Illumina samples, with a mean error rate of 4.20% (Fig. S3 and Table S3). This was consistent with the 4% error rate reported by Gunter et al. [47] for R10.3 flow cell chemistry. After consensus calling with amplicon_sorter, however, and without further polishing with medaka, the percentage of consensus sequences per sample that remained erroneous dropped to 0-50.0% (average 24.0%), and error rates correspondingly decreased to 0-1.18% (average 0.40%) (Fig. S3 and Table S3).
Furthermore, for the 444 MOTUs shared between Illumina and nanopore, nanopore sequences from 406 MOTUs (91.4%) did not have indel errors when compared to the same MOTU's Illumina sequences (Table S4). Of the remaining 38 MOTUs, 22 had nanopore sequences with one indel error, five had two indel errors, and 11 had three or more. Since our Illumina sequences were already confirmed to be translatable, this in turn confirmed that 91.4% of the nanopore consensus sequences were free of frameshift errors, and thus translatable as well.
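As a rough illustration of how substitutions and indels can be tallied when a nanopore consensus is compared with its Illumina counterpart, the sketch below counts both from a pairwise alignment. It assumes the two sequences have already been aligned (equal length, `-` for gaps) and treats a run of consecutive gaps as one indel event; both are assumptions of this example, not a description of the study's pipeline, and the sequences are invented.

```python
def alignment_errors(ref_aln, query_aln):
    """Count substitutions and indel events between two aligned sequences.

    ref_aln and query_aln are rows of the same pairwise alignment, so they
    have equal length and use '-' for gaps. A run of consecutive gaps on one
    side is counted as a single indel event (an assumption of this sketch).
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    substitutions = indel_events = 0
    prev_state = "match"
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r == "-" and q == "-":
            continue  # gap on both sides carries no information
        if r == "-":
            state = "insertion"  # extra base in the query
        elif q == "-":
            state = "deletion"  # base missing from the query
        else:
            state = "match"
            if r != q:
                substitutions += 1
        if state != "match" and state != prev_state:
            indel_events += 1  # a new gap run starts here
        prev_state = state
    return substitutions, indel_events


# Hypothetical aligned fragments: Illumina reference vs nanopore consensus.
illumina_aln = "ACGTACGTACGT"
nanopore_aln = "ACGTAC-TACCT"

subs, indels = alignment_errors(illumina_aln, nanopore_aln)
print(f"substitutions={subs}, indel events={indels}, indel-free={indels == 0}")
```

Under this definition, a consensus with zero indel events keeps the reading frame of its translatable reference, which is the property emphasised above.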

Nanopore sequencing with time
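We subsampled the fast5 reads of each run at hourly intervals for the first three hours, and every three hours thereafter, to investigate the relationship of (i) number of raw reads, (ii) number of demultiplexed reads, and (iii) number of metazoan MOTUs obtained over time. Although the number of samples differed between runs, both runs showed a similar trend in that all three variables increased at a decreasing rate over time (Fig. 4). Raw reads and demultiplexed reads both increased proportionately with respect to each other, with both variables only starting to plateau near the end of the respective runs. Conversely, metazoan MOTUs largely stabilised by the midway mark of each run, with RUN A and B obtaining 85% of the final MOTU count by the 12- and 15-hour mark respectively (Table S5). Beyond that, however, further increases in reads did not translate to a substantial increase in metazoan MOTUs.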
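The time-course summary can be expressed as a small calculation: given cumulative MOTU counts at each subsampled time point, find the first time point at which a chosen fraction of the final count is reached. The sketch below uses invented counts and a hypothetical function name; only the sampling schedule (hourly for three hours, then every three hours) follows the text.

```python
def hours_to_reach(fraction, hours, motu_counts):
    """First sampled time point where the cumulative MOTU count reaches
    `fraction` of the final count (None if no time point qualifies)."""
    if len(hours) != len(motu_counts) or not hours:
        raise ValueError("hours and motu_counts must be non-empty and equal in length")
    target = fraction * motu_counts[-1]
    for t, n in zip(hours, motu_counts):
        if n >= target:
            return t
    return None


# Subsampling schedule from the text; the MOTU counts are invented for illustration.
hours = [1, 2, 3, 6, 9, 12, 15, 18, 21, 24]
motus = [60, 110, 150, 230, 290, 340, 365, 380, 392, 400]

t85 = hours_to_reach(0.85, hours, motus)
print(f"85% of final MOTU count reached by hour {t85}")
```

Because the answer is limited to the sampled time points, the result is an upper bound on the true time at which the threshold was crossed.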

Discussion
Using a set of zooplankton samples as our case study, we performed nanopore-based metabarcoding using ONT's MinION sequencer, and processed the reads with amplicon_sorter to show that nanopore metabarcodes are comparable to Illumina-based metabarcodes, and ready to be incorporated into more projects. Our study is also the first to emphasise that nanopore metabarcodes are nearly indel-free, an aspect that remained unexamined in past studies. We do note that nanopore metabarcoding is not perfect, and so the strengths and weaknesses of nanopore metabarcoding with amplicon_sorter are discussed below.

Nanopore metabarcodes are highly accurate and virtually indel-free
It is now possible to achieve highly accurate nanopore consensus metabarcodes with amplicon_sorter. In our case, nanopore consensus metabarcodes were observed to be ~ 99.6% accurate when benchmarked against their respective Illumina samples. We note this to be slightly better than the median 99.3% sequencing accuracy observed by Baloğlu et al. [20], which could be due to our use of the R10.3 sequencing chemistry and SUP basecalling model. Furthermore, amplicon_sorter generated consensus metabarcodes that did not require further polishing, mirroring an observation made by Srivathsan et al. [49], and more recently by Wick (https://rrwick.github.io/2023/12/18/ont-only-accuracy-update.html) with the most updated sequencing chemistry and basecalling models. This is in contrast to prior nanopore metabarcoding pipelines that always included a polishing step, e.g., Egeter et al. [27] polished their sequences with RACON, while decona [45] incorporated medaka for polishing. We observed only a negligible 0.02% improvement in error rates for our nanopore metabarcodes after polishing, which corroborates Wick's findings that polishing is no longer needed (https://rrwick.github.io/2023/12/18/ont-only-accuracy-update.html). This is advantageous as it saves time and computational resources, because each consensus sequence has to be polished individually when running medaka. For our dataset alone, nearly 4,000 instances of medaka were run, and this is unlikely to scale well computationally for more diverse, or larger-scale, metabarcoding projects, where the number of consensus sequences obtained is expected to increase. An added advantage was that almost all our unpolished nanopore metabarcodes were indel-free (91.4%) when compared to their Illumina counterparts, with most (27) of the 38 remaining nanopore sequences having only one or two indel errors. Existing nanopore metabarcoding benchmarking studies typically investigate sequencing accuracy [20], and unfortunately do not report gap errors, making a direct comparison with our findings difficult. Nevertheless, our workflow presents an improvement over existing pipelines like decona or MSI, as initial tests with the same dataset suggested that polishing programs like RACON and medaka did not greatly improve error rates, and that most nanopore metabarcodes still contained indel errors. Our validation that nanopore metabarcodes are almost always indel-free means that nanopore metabarcodes can now be subjected to translation checks without error, which would boost the quality of nanopore metabarcodes. Lastly, we were able to achieve clustering and error correction with amplicon_sorter alone, and with a single command, which simplifies the analysis workflow.

Lower MOTU richness with nanopore metabarcoding than Illumina
While we have demonstrated that nanopore metabarcoding generated metabarcodes with Illumina-like quality, we recognise that it yielded certain differences in other aspects when benchmarked against Illumina. The most notable difference was in MOTU richness, where we obtained 589 Illumina MOTUs, compared to 471 nanopore MOTUs, with 444 MOTUs shared across both platforms (72% congruence) (Fig. 1b, inset). This was corroborated by a significant difference from the paired Wilcoxon signed-rank test (Fig. 1d).
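The 72% congruence figure is consistent with the number of shared MOTUs divided by the number of MOTUs in the union of both platforms (a Jaccard-style ratio). The short sketch below reproduces that arithmetic from the counts reported above; interpreting congruence this way is an assumption of the example, and the function name is hypothetical.

```python
def overlap_summary(n_a, n_b, n_shared):
    """Return (unique to A, unique to B, union size, shared/union ratio)."""
    unique_a = n_a - n_shared
    unique_b = n_b - n_shared
    union = n_a + n_b - n_shared
    return unique_a, unique_b, union, n_shared / union


# Counts reported in the text: 589 Illumina MOTUs, 471 nanopore MOTUs, 444 shared.
illumina_only, nanopore_only, union, congruence = overlap_summary(589, 471, 444)

print(f"Illumina-only: {illumina_only}, nanopore-only: {nanopore_only}")
print(f"Union: {union}, shared/union: {congruence:.1%}")
```

On these counts, 145 MOTUs are unique to Illumina and 27 are unique to nanopore, so most of the disagreement comes from MOTUs that nanopore did not recover.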
Based on our Kendall's τ analysis, MOTUs present in Illumina, but missing in nanopore, were MOTUs that generally had very low read depth. This means that MOTUs missed by nanopore sequencing were rarer in the community. The simplest explanation would be that MOTU differences were a consequence of sequencing effort between platforms, or even stochasticity in the adapter ligation efficiency during the respective Illumina and nanopore library preparation steps, but these are oftentimes difficult to account for. We also investigated two potential reasons relating to amplicon_sorter to assess if the MOTU differences could also be program-related.
The first reason was the resolution limit of amplicon_sorter, presently at 95-96% [48]. This means that closely related species, with less than 4% divergence in the COI sequence, will be grouped together by amplicon_sorter, resulting in a lower number of MOTUs obtained. This was challenging to determine as our zooplankton samples were not mock communities, and we did not have prior knowledge of closely related species groups that we could use to evaluate the resolution limits. We screened ZPT024 and ZPT034, the samples that had the lowest Jaccard similarity coefficients between Illumina and nanopore. We first searched for a MOTU that was detected in both Illumina and nanopore for that sample, and then checked if there were any congenerics found in Illumina but not in nanopore (we assumed that congenerics had a higher likelihood of being closely related compared to other taxonomic ranks). We then checked if the pairwise p-distance between these sequences was ≤ 4%, but since we did not encounter any such instance, we do not think that the resolution limit of amplicon_sorter was the main contributing factor for differences in MOTU richness in our study. We recommend that future users pay special attention to this resolution limit when selecting metabarcoding loci. For instance, zooplankton metabarcoding studies have used hypervariable regions in nuclear 18S rRNA [83][84][85], nuclear 28S rRNA [86], and mitochondrial 16S rRNA [87] in addition to COI [88][89][90][91]. The chosen loci must be divergent enough that species groups are not over-collapsed by amplicon_sorter.
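The congeneric screen described above rests on a pairwise p-distance, i.e. the proportion of compared sites that differ between two aligned sequences. A minimal sketch follows; it assumes pre-aligned sequences of equal length, skips sites with a gap or `N` in either sequence, and uses the 4% threshold from the text as a parameter. The sequences and names are invented for illustration.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap ('-') or ambiguous base ('N') in either sequence are
    skipped, which is one common convention and an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


RESOLUTION_THRESHOLD = 0.04  # sequences at or below 4% divergence may be grouped together

# Hypothetical 100-bp pair: the variant differs from the reference at three sites.
reference = "ACGT" * 25
variant_bases = list(reference)
for pos in (10, 50, 90):
    variant_bases[pos] = "A"  # the reference carries G at these positions
variant = "".join(variant_bases)

distance = p_distance(reference, variant)
may_be_collapsed = distance <= RESOLUTION_THRESHOLD
print(f"p-distance = {distance:.2%}, at or below threshold: {may_be_collapsed}")
```

A pair flagged this way would be a candidate for over-collapsing; the text reports that no such pair was found in ZPT024 or ZPT034.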
The second potential cause relates to the observation that amplicon_sorter grouped only ~ 57% of the reads on average for consensus calling; we therefore checked if the MOTUs unique to Illumina could be found in the unsorted nanopore reads. We mapped the ungrouped nanopore reads to the unique Illumina MOTUs with mapPacBio.sh (see Methods), and found that had amplicon_sorter incorporated these reads, 22 ZPT samples would have had a complete overlap with the MOTUs detected by Illumina sequencing. The remaining 10 samples would mostly still lack 1-2 MOTU(s), with only ZPT008 and ZPT049 missing four and five MOTUs respectively. We further found that the unsorted nanopore reads had a comparatively higher total error rate of ~ 4.52%, above the distance or length thresholds for forming and grouping clusters. This implied that bioinformatic processing of reads by amplicon_sorter was the more likely reason for the MOTU difference. Further tests, however, are needed to better optimise consensus calling settings with amplicon_sorter.
In any case, we note that the aforementioned limitations of amplicon_sorter are unlikely to pose a major issue to future metabarcoding projects, given that ONT is continuously updating its flow cell chemistry and basecalling algorithms. Its most recent pivot to the R10.4.1 flow cell version and v14 kit chemistry (SQK-LSK114) offers Q20+ raw read accuracy (i.e., 1 in 100 error rate). The likely implication is higher-quality raw reads that allow for more precise formation and merging of species groups by amplicon_sorter, which in turn will likely improve the resolution limits of the algorithm. For instance, Ni et al. [92] and Sereika et al. [93] have reported ~ 99.1% modal raw read accuracy when using the latest R10.4 sequencing chemistry, a considerable improvement compared to the v9 + R10.3 sequencing chemistry we used. In addition, with ONT's latest duplex basecalling capabilities, ~ 99.9% accurate, Q30+ raw reads for metabarcoding are fast becoming a reality [18]. It is thus quite foreseeable that the limiting factors of amplicon_sorter will resolve as nanopore read quality improves with time.
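The Q-scores quoted above follow the standard Phred convention, in which a quality score Q corresponds to an error probability of 10^(-Q/10); Q20 is therefore a 1 in 100 error rate, and every additional 10 points reduces the error rate tenfold. The helper functions below are a small illustrative sketch of that conversion (the function names are hypothetical), applied to figures mentioned in this study.

```python
from math import log10


def phred_to_error(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def accuracy_to_phred(accuracy):
    """Phred quality score implied by a per-base accuracy (0 <= accuracy < 1)."""
    if not 0 <= accuracy < 1:
        raise ValueError("accuracy must be in [0, 1)")
    return -10 * log10(1 - accuracy)


for q in (20, 30):
    print(f"Q{q}: error rate {phred_to_error(q):.3%}")

# Mean raw-read error rate reported for this study's R10.3 data: 4.20%.
raw_read_q = accuracy_to_phred(1 - 0.0420)
# Average consensus error rate after amplicon_sorter: 0.40%.
consensus_q = accuracy_to_phred(1 - 0.0040)
print(f"Raw reads ~ Q{raw_read_q:.1f}, consensus ~ Q{consensus_q:.1f}")
```

Expressed this way, the raw reads in this study sit well below Q20, while the unpolished consensus sequences already exceed it.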

Nanopore metabarcoding costs and turnaround times
Various studies have compared sequencing costs between nanopore and Illumina for metabarcoding, and it is generally agreed that nanopore metabarcoding with the MinION is cheaper than Illumina MiSeq [28, 29]. We reduced reagent costs further by adopting a single-PCR tagging strategy, where each of our PCR primers was tagged at the 5'-end with 13-bp tags [49]. This enabled us to pool multiple PCR replicates into just two pools for nanopore library preparation without further need to barcode them. The only downside was that it required separate software (e.g., ONTbarcoder) rather than Guppy for sample demultiplexing. However, the single-PCR tagging saved us processing time because the tagging occurred during thermocycling rather than as an additional step in the library preparation process (thermocycling runs for the same length of time regardless of whether tagging is performed). The general utility of tagged PCR primers also means that they can be used for other DNA sequencing projects [50,64,81], and even for Illumina sequencing (as in this study).
Another attractive property of nanopore sequencing is its ability to sequence in real-time. Users can terminate the run when their sequencing needs have been met, wash the flow cell and even reuse it later. We were thus interested to know if there was a "sweet-spot" for MOTU richness obtained in relation to sequencing run time for metabarcoding, based on the observation that up to 90% of DNA barcodes were obtained within the first few hours [49]. Our preliminary examination from subsampling nanopore reads with time showed that both runs reached ~ 85% of the final MOTU count in under 12 h and 15 h for RUNs A and B respectively (Fig. 4 and Table S5), and sequencing beyond that did not lead to a substantial increase in the number of metazoan MOTUs recovered. We recognise that the relationship between run time and MOTUs recovered is not immediately clear for nanopore metabarcoding (vis-à-vis DNA barcoding). Metabarcoding is likely to be more sensitive to factors such as the number of samples pooled into one flow cell, flow cell health (different flow cells may start with different numbers of pores available for sequencing) and even pore occupancy (percentage of pores actively sequencing). More tests are needed on the number of metabarcoding samples that can be comfortably multiplexed onto a MinION flow cell without compromising recovered MOTU diversity. What was clear, however, was that turnaround times were much faster; it took us three days to complete both nanopore runs (we ran RUN A and B consecutively), in contrast to outsourcing Illumina MiSeq sequencing, which would take 2-4 weeks at the very least. Researchers have even taken advantage of this quicker turnaround time in time-sensitive situations such as disease surveillance [94]. Even for zooplankton biomonitoring, where sampling intervals can be as often as every two weeks [95], a nanopore-based metabarcoding approach would enable quicker generation of results, making proposed routine biomonitoring strategies like that of Song et al. [96] more operationally feasible.

Nanopore metabarcoding for community characterisation
From an operational perspective, we have demonstrated that nanopore-based metabarcoding is viable when benchmarked against Illumina sequencing. Our nanopore metabarcodes were virtually Illumina-like, even with (soon-to-be-obsolete) v9 library preparation kits and R10.3 MinION flow cells. This is only going to improve moving forward, and it is time to relinquish the perception that nanopore sequencing produces highly erroneous reads. Even though there were differences between sequencing platforms, we ultimately found that the same ecological conclusions were obtained regardless: our zooplankton communities were structured by date, site and fraction, and using a different sequencer was not a significant factor in explaining zooplankton community dissimilarities. Even the relative abundance of MOTUs was fairly consistent across sequencing platforms (88% congruence) and both sequencers successfully recovered 10 metazoan phyla. This also means that future users can employ nanopore sequencing for community metabarcoding with the confidence that their results will be consistent with Illumina, with the potential to leverage the cost-effectiveness, portability and real-time advantages that nanopore sequencing brings. For example, some studies have already incorporated in situ nanopore metabarcoding on board marine vessels [23,26], and we believe more will follow suit in future, especially in the field of plankton monitoring. We did observe, however, that amplicon_sorter was less likely to recover rarer MOTUs in the community compared to Illumina. Hence, users who wish to detect rarer species with degenerate primer sets will have to go with conventional Illumina sequencing in order to increase the chances of detection. We do believe this drawback can soon be addressed given that the latest and most accurate R10.4.1 sequencing chemistry is already available, and there are an increasing number of promising reports regarding its use [18,60,92,93]. Further benchmarking studies will be needed to investigate how these improvements impact metabarcoding.

Conclusions
DNA metabarcoding is a powerful technique that can be harnessed to generate numerous sequence reads in parallel for multi-species identification and much more. Presently, DNA metabarcoding is conducted using second-generation sequencing mainstays like Illumina, and less so on third-generation sequencers like ONT's MinION sequencer. We surmised that this was likely due to the notoriously high error rates of nanopore reads, as well as the general lack of specialised programs that can process such erroneous reads. Existing nanopore metabarcoding workflows either incorporate complicated and time-consuming laboratory steps, or require custom reference databases, or additional polishing steps, which perhaps disincentivises the use of nanopore sequencing for metabarcoding. However, recent improvements in nanopore read accuracy in conjunction with new bioinformatic pipelines have led us to posit that nanopore sequencing can now produce highly accurate metabarcoding results that are consistent with conventional Illumina sequencing, and without the need to polish the sequences unlike in the past. We demonstrated this by metabarcoding 34 bulk zooplankton communities on two R10.3 MinION flow cells, and processing the reads with amplicon_sorter. Our results showed that: (i) nanopore metabarcodes are nearly Illumina-like in sequencing accuracy (99.6%) and are almost always indel-free (91.4%); (ii) relative abundance of MOTUs was congruent (88%) across both platforms, and nanopore recovered the abundant MOTUs just as well as Illumina but struggled to capture the rarer taxa; and (iii) ecological conclusions were consistent across sequencing platforms when metabarcoding zooplankton communities despite some differences in species richness recovered. Reports of the newly released R10.4.1 sequencing chemistry already indicate vast improvements in the quality of nanopore sequences. We are confident that our results will inspire greater assurance in the utility of nanopore technology for more, and perhaps even larger-scale, metabarcoding-related projects in the near future.

Fig. 1 Sequencing statistics of zooplankton metabarcoding with Illumina MiSeq and Nanopore MinION. (a) Bar plot of sequence reads demultiplexed per sample per sequencing dataset. (b) Species accumulation curves of molecular operational taxonomic unit (MOTU) richness for each sequencing platform against the number of samples, extrapolated to visualise number of samples needed to capture maximum richness; number of MOTUs obtained (and shared) expressed in Venn diagram (inset). (c) Bar plots showing the number of MOTUs obtained per sample per sequencing type. (d) Box plots comparing MOTU richness, Simpson index, and Shannon-Wiener index between sequencing platforms; asterisks indicate significant differences for paired Wilcoxon signed-rank tests, and dots represent individual sample points (jittered)

Fig. 2 Two-dimensional nonmetric multidimensional scaling (nMDS) plots based on normalised Bray-Curtis distances for Illumina (a, c, e, g), and nanopore (b, d, f, h); coloured by sampling method (a and b), sampling site (c and d), date (e and f), and size fraction (g and h). ZPT024 was removed from the nanopore dataset to better visualise the spread of points; it was similarly distinct from the remaining samples for both sequencing types

Fig. 3 Bar plots showing the relative proportions of molecular operational taxonomic units (MOTUs, grouped by phylum) by sequencing type (a), and by sample (b)

Fig. 4 Line graphs showing the change in number of raw reads (green), demultiplexed reads (orange) and metazoan MOTUs (purple) with sequencing run time, for RUN A (left) and RUN B (right)

Table 1
Permutational multivariate analysis of variance (PERMANOVA) results comparing community differences between nanopore and Illumina metabarcoding datasets, with Jaccard coefficient and Bray-Curtis dissimilarity. Variables with significant p-values are highlighted in bold

Table 2
Permutational multivariate analysis of variance (PERMANOVA) results comparing Bongo net communities for nanopore and Illumina datasets, using Bray-Curtis dissimilarity. Variables with significant p-values are highlighted in bold